Last updated: 2021-02-24
Checks: 7 passed, 0 failed
Knit directory: Supplemenetary_data_files_automation_v2/
This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20201119) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version d11a11c. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/Visual_provincial_view/
Ignored: code/Construction/
Ignored: data/Inputs/Large_tables/
Untracked files:
Untracked: DELETE.jpeg
Untracked: Rplot.jpeg
Untracked: analysis/graphing_heierachies.rmd
Untracked: delete.rds
Untracked: delete/
Untracked: output/CPI_18100005.csv
Untracked: output/Population_17-10-0005-01.csv
Untracked: output/Provincial_GDP_36-10-0402.csv
Untracked: output/Trade_Monthly12-10-0119-01.csv
Untracked: output/annual_trade_table_12-10-0119-01.csv
Untracked: output/grain_production 32-10-0351-01.csv
Untracked: output/output.7z
Untracked: output/provincial_view.csv
Untracked: test.html
Unstaged changes:
Modified: analysis/SUT_presentation_RY2018.Rmd
Modified: code/Publishing_script.R
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/Provincial_view.rmd) and HTML (docs/Provincial_view.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
| File | Version | Author | Date | Message |
|---|---|---|---|---|
| Rmd | d11a11c | arman | 2021-02-24 | wflow_publish(c(“analysis/Provincial_view.rmd”, “analysis/index.Rmd”)) |
| html | c19cc63 | arman | 2021-02-19 | Build site. |
| Rmd | 78d8bfe | arman | 2021-02-19 | wflow_publish(c(“analysis/Provincial_view.rmd”, “analysis/index.Rmd”)) |
| html | f3e9100 | arman | 2021-02-19 | Build site. |
| Rmd | 3923202 | arman | 2021-02-19 | wflow_publish(c(“analysis/Provincial_view.rmd”, “analysis/index.Rmd”)) |
| Rmd | 4e3c230 | arman | 2021-02-19 | added seph and completed the table fully |
| Rmd | 56dc30d | arman | 2021-02-19 | fixed population table |
| html | a942ba2 | arman | 2021-02-19 | Build site. |
| Rmd | 90d0714 | arman | 2021-02-19 | this is a big update; fixed the toc so it shows at the beginning and is more interactive. |
| html | fdfae2d | arman | 2021-02-18 | Build site. |
| html | 280e930 | arman | 2021-02-18 | Build site. |
| Rmd | 940c017 | arman | 2021-02-18 | wflow_publish(c(“analysis/Provincial_view.rmd”, “analysis/index.Rmd”)) |
| html | 3a37dab | arman | 2021-02-18 | Build site. |
| Rmd | fcecdd6 | arman | 2021-02-18 | wflow_publish(c(“analysis/Provincial_view.rmd”, “analysis/index.Rmd”)) |
| Rmd | baaed92 | arman | 2021-02-18 | Completed hours worked. |
| Rmd | 93e5a48 | arman | 2021-02-17 | added LFS hours worked and fixed the wordings of other categories to make it more clear |
| Rmd | 73e6f48 | arman | 2021-02-15 | started adding capital and repair expenditures, |
| Rmd | 7c1c5b1 | arman | 2021-02-15 | added a way to pivot a value from SEPH in order to perserve the status symbol alongside the actual value. |
| Rmd | 91a1447 | arman | 2021-02-14 | added a few tables by seph re oragnized the document to better reflect the mental model of the sources |
| Rmd | 2828d5b | arman | 2021-02-13 | added the link |
| Rmd | 86c11c9 | arman | 2021-02-12 | addded a way to save tables as a standalone for easy transfer and work |
| html | c760d5f | arman | 2021-02-12 | Build site. |
| Rmd | 4b024d4 | arman | 2021-02-12 | wflow_publish(c(“analysis/Provincial_view.rmd”, “analysis/index.Rmd”)) |
| Rmd | a7c1cea | arman | 2021-02-12 | ADDED THE STYLE COLOR for import/export |
| Rmd | 03af6b7 | arman | 2021-02-12 | added industrial capacity |
| html | 9bec129 | arman | 2021-02-10 | Build site. |
| Rmd | cc2129d | arman | 2021-02-10 | fixed the names in the final table so that it will show better in the final table |
| html | 836b3f7 | arman | 2021-02-10 | Build site. |
| Rmd | db943d4 | arman | 2021-02-10 | added functions to create a csv output for every file |
| html | d20cf2d | arman | 2021-02-08 | Build site. |
| Rmd | 555b1a2 | arman | 2021-02-08 | wflow_publish(c(“analysis/Provincial_view.rmd”, “analysis/index.Rmd”)) |
| html | acae2de | arman | 2021-02-08 | Build site. |
| Rmd | 5b9cbf6 | arman | 2021-02-08 | fixed the date |
| html | c978ec7 | arman | 2021-02-08 | Build site. |
| html | 210b49e | arman | 2021-02-08 | Build site. |
| Rmd | 39b2d18 | arman | 2021-02-08 | wflow_publish(c(“analysis/Provincial_view.rmd”, “analysis/index.Rmd”)) |
| Rmd | 05f2e30 | arman | 2021-02-08 | Finished LFS |
| Rmd | 03e788f | arman | 2021-02-07 | fixed the employment |
| Rmd | 07cced5 | arman | 2021-02-05 | added employment by industury |
| html | 3077a6a | arman | 2021-02-04 | Build site. |
| Rmd | 3a7a11e | arman | 2021-02-04 | wflow_publish(c(“analysis/Provincial_view.rmd”, “analysis/index.Rmd”)) |
| html | 0276906 | arman | 2021-02-04 | Build site. |
| Rmd | 8fdc2eb | arman | 2021-02-04 | added CPI with goods services and all items |
| Rmd | 41617a0 | arman | 2021-02-04 | ADDED the tables for prices/agriculture |
| html | f449255 | arman | 2021-02-03 | Build site. |
| Rmd | 247d0fd | arman | 2021-02-03 | wflow_publish(c(“analysis/Provincial_view.rmd”, “analysis/index.Rmd”)) |
| Rmd | 3db0059 | arman | 2021-02-03 | added back the provinces to filter construction on; created and used across to add _K to all the columns and also to create summary annual tables easier |
| Rmd | 88a4612 | arman | 2021-02-02 | added a simple version of construction; will need some refinement. construction is rather complicated |
| Rmd | a105189 | arman | 2021-02-01 | addded Immigration and a method to cacluate any tibbles first difference and percent change |
| html | 6a066ff | arman | 2021-01-29 | Build site. |
| html | ad426b3 | arman | 2021-01-29 | Build site. |
| Rmd | aa8d76b | arman | 2021-01-29 | wflow_publish(c(“analysis/Provincial_view.rmd”, “analysis/index.Rmd”)) |
| Rmd | ab73d32 | arman | 2021-01-29 | Completed the master table for Trade population and PGDP |
| Rmd | 777e601 | arman | 2021-01-28 | created an annualized trade table. |
| Rmd | dcf20f4 | arman | 2021-01-27 | finished monthly trade with YOY % change. |
| Rmd | 974fcc7 | arman | 2021-01-26 | now measuring year over year average; |
| Rmd | 3deba0d | arman | 2021-01-26 | NDM has no consistent way of displaying dates; each table needs to be parsed in a custom manner. in this commit I use the parse date function in tidyverse to create a custom parse function for the dates in the trade table. hopefully I can use this to parse more |
| html | ffc28e9 | arman | 2021-01-22 | Build site. |
| Rmd | d13d74f | arman | 2021-01-22 | scraped the params and replaced a piece of r code; got the left join to work and merged the PGDP and population table |
| Rmd | 704ee1d | arman | 2021-01-21 | fuxed the developer_mode |
| Rmd | 76913f8 | arman | 2021-01-21 | testing new creds |
| Rmd | aaaadf9 | arman | 2021-01-21 | set up the first table. it all works |
| Rmd | 2ae6621 | arman | 2021-01-20 | added tables and the developer mode for including stuff in the document |
| Rmd | edeeade | arman | 2021-01-19 | ported the fucntions and set up added code lists and functions to be used for the project. |
| html | ac1cbff | arman | 2021-01-19 | Build site. |
| html | 1771361 | arman | 2021-01-19 | Build site. |
| html | 49fbbde | arman | 2021-01-19 | Build site. |
| Rmd | 8538afc | arman | 2021-01-19 | starting the official provincial view |
In this document I will create a view based on the following tables:

- Contribution to total economy
- Population 17-10-0009-01
- Immigration 17-10-0008-01
- Labour force, Statistics Canada table 14-10-0327-01
- Farms, by operation type 32-10-0403-01
- Aquaculture in Canada 32-10-0107-01
- Manufacturing industries 16-10-0117-01
- International merchandise trade by province, commodity, and Principal Trading Partners 12-10-0119-01
- Investment in Building Construction (monthly) 34-10-0175-01
- Industrial capacity utilization rates, by industry 16-10-0109-01
- Building permits, by type of structure and type of work 34-10-0066-01
- Employment and average weekly earnings (including overtime) for all employees by province and territory, monthly, seasonally adjusted 14-10-0223-01
In this section I will set up the foundation for the program. It is broken up into three parts: package dependencies and paths, functions, and finally code lists.
In this section I manage the dependencies: the packages used in this project. In R you must (more or less) state which packages you will use, and here we simply state them. They must be installed for the program to run properly; if you run this in the cloud, the commands below should be sufficient once you uncomment the first five lines.
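A sketch of what that dependency block might look like; the package list is inferred from the session info at the bottom of this page, and the availability check is illustrative rather than the project's actual code:

```r
# Packages this project relies on; install once, then load per session.
pkgs <- c("tidyverse", "workflowr", "cansim", "janitor",
          "here", "DT", "tsibble", "lubridate", "data.table", "plotly")

# install.packages(pkgs)  # uncomment on first run / in the cloud

# Report which packages are available without stopping the script.
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
names(available)[!available]  # any packages still missing
```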
All the files in this program will be created relative to where you run this script; it will create all the necessary subfolders and files from the location below.
here()
[1] "C:/Users/Arman/Documents/Statcan/R_projects/Supplemenetary_data_files_automation_v2"
Here I state the code lists the project uses to sort and query the data. If you want the table to include something else, you can simply add it here and it should work. It is designed this way to make the program more extendable and maintainable; the added complexity is worth it.
If you have any tables to add, please add them here; it will make the program easier to maintain.
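A sketch of what such code lists might look like as plain named character vectors; the entries below are illustrative assumptions, not the project's actual lists:

```r
# Illustrative code lists; add an entry here to extend what the view includes.
provinces <- c("Ontario", "Quebec", "Manitoba", "Alberta", "British Columbia")

# NDM table numbers keyed by a short name (hypothetical selection).
table_numbers <- c(
  pgdp       = "36-10-0402",
  population = "17-10-0005",
  trade      = "12-10-0119"
)

# Looking up a table number by name:
table_numbers[["trade"]]  # "12-10-0119"
```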
Here I describe the functions required by the project. The most important one creates interactive tables in the browser that let you choose which columns to show and search through the tables.
For the search function and for performance, it is important that the columns be categorical and/or numeric.
TODO: find a way to handle significant digits.
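The interactive_table() helper itself is defined in setup code not shown on this page; a minimal sketch of what such a function might look like with DT is below. The exact DT options are my assumption, not the project's actual definition.

```r
# A possible interactive_table(): searchable, with a column-visibility button.
# Converting character columns to factors first helps DT's filters and speed.
interactive_table <- function(df) {
  df <- dplyr::mutate(df, dplyr::across(where(is.character), as.factor))
  DT::datatable(
    df,
    extensions = "Buttons",
    filter = "top",
    options = list(dom = "Bfrtip", buttons = I("colvis"))
  )
}
```

Called as interactive_table(clean_PGDP), this renders a client-side searchable table in the knitted HTML.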
I will attempt to import and clean all the tables.

### PGDP

Here I will be working with Provincial GDP.
here I will include the tables by IAD
pgdp_table_number <- "36-10-0402"
pgdp <- cansim::get_cansim_ndm(pgdp_table_number)
Accessing CANSIM NDM product 36-10-0402 from Statistics Canada
Parsing data
Folding in metadata
pgdp <- janitor::clean_names(pgdp)
pgdp %>%
select(ref_date, geo, north_american_industry_classification_system_naics, value, hierarchy_for_north_american_industry_classification_system_naics, coordinate, classification_code_for_north_american_industry_classification_system_naics, value_2) %>%
mutate(ref_date = lubridate::make_date(ref_date)) %>%
rename(CPC = value_2) %>%
filter(ref_date > from_this_year) %>%
filter(value == "Contributions to percent change") %>%
select(!value) %>%
filter(classification_code_for_north_american_industry_classification_system_naics %in% c(two_digit_naics_code)) %>%
group_by(ref_date, north_american_industry_classification_system_naics, geo) -> clean_PGDP
# maybe wait till the end to convert the tables to save some space
interactive_table(clean_PGDP)
# data.table::fwrite(here("output", str_glue("Table_{tablenumber}_CPI_by_province_product_groups_{params$from_this_year}-{params$to_this_year}")))
pgdp_table_number <- "36-10-0402"
pgdp <- cansim::get_cansim_ndm(pgdp_table_number)
Reading CANSIM NDM product 36-10-0402 from cache.
pgdp <- janitor::clean_names(pgdp)
pgdp %>%
select(ref_date, geo, north_american_industry_classification_system_naics, value, value_2) %>%
mutate(ref_date = lubridate::make_date(ref_date)) %>%
pivot_wider(names_from = value, values_from = value_2) %>% filter(ref_date > from_this_year) -> clean_PGDP
clean_PGDP %>% data.table::fwrite(here::here("output", str_glue("Provincial_GDP_36-10-0402.csv")))
# maybe wait till the end to convert the tables to save some space
pgdp %>%
select(ref_date, geo, north_american_industry_classification_system_naics, value, value_2) %>%
mutate(ref_date = lubridate::make_date(ref_date)) %>%
rename(CPC = value_2) %>%
filter(ref_date > from_this_year) %>%
filter(value == "Contributions to percent change") %>%
select(!value) %>%
filter(north_american_industry_classification_system_naics == "All industries [T001]") %>%
pivot_wider(names_from = north_american_industry_classification_system_naics, values_from = CPC) %>%
janitor::clean_names() %>%
rename(CPC_All_industries = all_industries_t001) -> master_pgdp
add MGDP
population <- cansim::get_cansim_ndm(population_table_number)
Accessing CANSIM NDM product 17-10-0005 from Statistics Canada
Parsing data
Folding in metadata
population %>%
janitor::clean_names() %>%
select(1, 2, 4, 5, 12) %>%
rename(population = value) %>%
mutate(ref_date = lubridate::make_date(ref_date)) %>%
filter(ref_date > from_this_year) %>%
filter(age_group == "All ages") %>%
filter(sex == "Both sexes") %>%
select(1, 2, 5) %>%
mutate(across(where(is.character), as.factor)) %>%
arrange(geo, ref_date) %>%
group_by(geo) %>%
mutate(across(where(is.numeric), fd_pc, .names = "{.fn}.{.col}")) %>%
mutate(across(where(is.numeric), round, 2)) -> clean_population
# view(dfSummary(clean_population))
interactive_table(clean_population)
clean_population %>%
data.table::fwrite(here::here("output", str_glue("Population_{population_table_number}.csv")))
34-10-0035-01: I’m not sure where this fits in.
This is the YOY percent-change difference for the monthly trade by NAPCS. I will create the annual series tomorrow: simply sum it over the year. That seems to be what the chief economist did, and it also makes sense since the data are already seasonally adjusted.
trade <- cansim::get_cansim_ndm(trade_table_number)
Accessing CANSIM NDM product 12-10-0119 from Statistics Canada
Parsing data
Folding in metadata
trade <- janitor::clean_names(trade)
trade %>%
select(ref_date, geo, trade, north_american_product_classification_system_napcs, principal_trading_partners, value) %>%
mutate(ref_date = parse_date(ref_date, "%Y-%m")) %>%
filter(ref_date > from_this_year) %>%
pivot_wider(names_from = trade, values_from = value) %>%
janitor::clean_names() -> clean_trade
clean_trade %>%
arrange(north_american_product_classification_system_napcs, principal_trading_partners, geo, ref_date) %>%
group_by(north_american_product_classification_system_napcs, principal_trading_partners, geo) %>%
mutate(yoy_pct_change_import = ((import / lag(import, n = 12, order_by = ref_date)) * 100) - 100) %>%
mutate(yoy_first_dif_import = (import - lag(import, n = 12, order_by = ref_date))) %>%
mutate(yoy_pct_change_export = ((domestic_export / lag(domestic_export, n = 12, order_by = ref_date)) * 100) - 100) %>%
mutate(yoy_first_dif_export = (domestic_export - lag(domestic_export, n = 12, order_by = ref_date))) %>% data.table::fwrite(here::here("output", str_glue("Trade_Monthly{trade_table_number}.csv")))
# group_by(3,4,2) %>%
#data.table::fwrite(here::here("delete", str_glue("Table_{trade_table_number}_Trade_full_table_from{from_this_year}.csv")))
Below is a sample graph.
clean_trade %>%
filter(geo %in% c("Quebec", "Ontario", "Manitoba", "Alberta", "British Columbia"
)) %>%
filter(principal_trading_partners %in% c("All countries", "United States",
"China")) %>%
filter(north_american_product_classification_system_napcs != "Total of all merchandise") %>%
ggplot() +
aes(x = ref_date, y = domestic_export, colour = north_american_product_classification_system_napcs) +
geom_line(size = 0.7) +
scale_color_hue() +
theme_minimal() +
facet_grid(vars(geo), vars(principal_trading_partners)) -> P
plotly::ggplotly(P,dynamicTicks = TRUE)
Imports:
clean_trade %>%
filter(geo %in% c("Quebec", "Ontario", "Manitoba", "Alberta", "British Columbia"
)) %>%
filter(principal_trading_partners %in% c("All countries", "United States",
"China")) %>%
filter(north_american_product_classification_system_napcs != "Total of all merchandise") %>%
ggplot() +
aes(x = ref_date, y = import, colour = north_american_product_classification_system_napcs) +
geom_line(size = 0.7) +
scale_color_hue() +
theme_minimal() +
facet_grid(vars(geo), vars(principal_trading_partners)) -> P
plotly::ggplotly(P,dynamicTicks = TRUE)
Annualized trade: run this in mid-February so that all the months are present for a reference year. Doing so earlier will leave the most recent year under-represented; in that case the best choice is a moving average or an estimated value. Below is the raw trade data from NDM; it comes monthly and with a lot of detail, and I filter most of it out for this provincial view.
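The moving-average fallback mentioned above could be sketched in base R with stats::filter; the helper name ma12 is my own, and the flat series is made-up illustration:

```r
# Trailing 12-month moving average; NA until 12 observations are available.
# sides = 1 means only current and past values enter each average.
ma12 <- function(x) {
  as.numeric(stats::filter(x, rep(1 / 12, 12), sides = 1))
}

monthly <- rep(10, 24)  # two flat years of monthly values
ma <- ma12(monthly)
ma[12]                  # 10: the average of the first full year
```

Applied to an import or export series grouped by province, this gives an annualized level that is robust to a missing month or two at the end of the series.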
# make sure you run this after the december release; 8 weeks after reference period is the release
library(tidyverse)
library(tsibble)
library(lubridate)
Attaching package: 'lubridate'
The following object is masked from 'package:tsibble':
interval
The following objects are masked from 'package:base':
date, intersect, setdiff, union
trade <- cansim::get_cansim_ndm(trade_table_number)
Reading CANSIM NDM product 12-10-0119 from cache.
trade <- janitor::clean_names(trade)
trade %>%
select(ref_date, geo, trade, north_american_product_classification_system_napcs, principal_trading_partners, value) %>%
mutate(ref_date = parse_date(ref_date, "%Y-%m")) %>%
filter(ref_date > from_this_year) %>%
pivot_wider(names_from = trade, values_from = value) %>%
janitor::clean_names() -> clean_trade
data_tsbl <- as_tsibble(clean_trade, key = c(north_american_product_classification_system_napcs, principal_trading_partners, geo))
Using `ref_date` as index variable.
data_tsbl %>%
group_by_key() %>%
index_by(year_month = ~ year(.)) %>%
summarise(
import = sum(import, na.rm = TRUE),
export = sum(domestic_export, na.rm = TRUE)
) -> annual_trade
# saving the CSV; str_glue evaluates anything between {} as R code
annual_trade %>% data.table::fwrite(here::here("output", str_glue("annual_trade_table_{trade_table_number}.csv")))
as_tibble(annual_trade) %>%
filter(principal_trading_partners == "All countries") %>%
filter(north_american_product_classification_system_napcs == "Total of all merchandise") %>%
rename(ref_date = year_month) %>%
mutate(ref_date = lubridate::make_date(ref_date)) %>%
select(geo, ref_date, import, export) -> master_trade
left_join(clean_population, master_trade) %>% left_join(master_pgdp) -> master_provincial_table
Joining, by = c("ref_date", "geo")
Joining, by = c("ref_date", "geo")
Immigration <- cansim::get_cansim_ndm(immigration_table_number)
Accessing CANSIM NDM product 17-10-0008 from Statistics Canada
Parsing data
Folding in metadata
Immigration <- janitor::clean_names(Immigration)
Immigration %>%
select(ref_date, geo, components_of_population_growth, value) %>%
mutate(ref_date = parse_date(ref_date, "%Y /%Y")) %>%
filter(ref_date > from_this_year) %>%
pivot_wider(names_from = components_of_population_growth, values_from = value) %>%
group_by(geo) %>%
clean_names() -> clean_immigration
master_provincial_table %>%
left_join(clean_immigration) -> master_provincial_table
Joining, by = c("ref_date", "geo")
All construction numbers are for the total type of work and are in unadjusted thousands of dollars. Maybe I should just use 23A_K_ind.
construction <- cansim::get_cansim_ndm(building_construction_table_number)
Accessing CANSIM NDM product 34-10-0175 from Statistics Canada
Parsing data
Folding in metadata
construction <- janitor::clean_names(construction)
construction %>%
select(ref_date, geo, type_of_structure, type_of_work, investment_value, value) %>%
mutate(ref_date = parse_date(ref_date, "%Y-%m")) %>%
filter(ref_date > from_this_year) %>%
filter(geo %in% provinces) %>%
filter(type_of_work == "Types of work, total") %>%
select(!type_of_work) %>%
filter(investment_value %in% c("Unadjusted - constant")) %>%
select(!investment_value) %>%
filter(type_of_structure %in% c("Total residential and non-residential", "Total residential", "Total non-residential")) %>%
pivot_wider(names_from = type_of_structure, values_from = value) %>%
clean_names() %>%
rename_with(~ str_c(.x, "_investment_unadjusted_K$"), where(is.numeric)) -> clean_construction
data_tsbl <- as_tsibble(clean_construction, key = c(geo))
Using `ref_date` as index variable.
data_tsbl %>%
group_by_key() %>%
index_by(year_month = ~ year(.)) %>%
summarise(across(where(is.numeric),list( a = sum))) -> master_construction
master_construction %>%
as_tibble() %>% rename(ref_date = year_month) %>%
mutate(ref_date = lubridate::make_date(ref_date)) %>%
select(everything()) -> master_construction
master_provincial_table <- left_join(master_construction,master_provincial_table)
Joining, by = c("geo", "ref_date")
# saving the CSV
these other two tables that I am not sure if I will include
here in this section I will ad prices
cpi <- cansim::get_cansim_ndm(cpi_table_number)
Accessing CANSIM NDM product 18-10-0005 from Statistics Canada
Parsing data
Folding in metadata
cpi <- janitor::clean_names(cpi)
cpi %>%
select(ref_date,geo,products_and_product_groups,uom,value) %>%
mutate(ref_date = parse_date(ref_date, "%Y")) %>%
filter(ref_date > from_this_year) %>%
filter(uom == "2002=100") %>%
select(!uom) %>%
filter(geo %in% provinces) -> temp_CPI
temp_CPI %>%
data.table::fwrite(here::here("output", str_glue("CPI_{cpi_table_number}.csv")))
interactive_table(temp_CPI)
temp_CPI %>%
filter(products_and_product_groups %in% c("All-items", "Goods","Services")) %>%
pivot_wider(names_from = products_and_product_groups, values_from = value) -> clean_cpi
master_provincial_table <- left_join(master_provincial_table,clean_cpi)
Joining, by = c("geo", "ref_date")
Producer prices.
LFS_annual <- cansim::get_cansim_ndm(annual_lfs_table_number)
Accessing CANSIM NDM product 14-10-0023 from Statistics Canada
Parsing data
Folding in metadata
LFS_annual <- janitor::clean_names(LFS_annual)
LFS_annual %>%
select(everything()) %>%
mutate(ref_date = parse_date(ref_date, "%Y")) %>%
filter(ref_date > from_this_year) %>%
filter(age_group == "15 years and over") %>%
filter( sex == "Both sexes" ) %>%
select(ref_date,geo,labour_force_characteristics,north_american_industry_classification_system_naics,value) %>%
pivot_wider(names_from = labour_force_characteristics, values_from = value) %>% interactive_table()
clean_trade$ref_date[1]
[1] "2017-01-01"
14-10-0037-01
LFS_hoursworked <- cansim::get_cansim_ndm("14-10-0037-01")
Accessing CANSIM NDM product 14-10-0037 from Statistics Canada
Parsing data
Folding in metadata
LFS_hoursworked <- janitor::clean_names(LFS_hoursworked)
LFS_hoursworked %>%
select(ref_date, geo, actual_hours_worked, class_of_worker, north_american_industry_classification_system_naics, sex, value) %>%
mutate(ref_date = parse_date(ref_date, "%Y")) %>%
filter(ref_date > from_this_year) %>%
filter(class_of_worker == "Total employed") %>%
filter(sex == "Both sexes") %>%
pivot_wider(names_from = actual_hours_worked, values_from = value) %>%
select(!class_of_worker) %>%
select(!sex) %>%
mutate(across(where(is.character), as.factor)) %>%
arrange(north_american_industry_classification_system_naics, geo, ref_date) %>%
group_by(north_american_industry_classification_system_naics, geo) %>%
mutate(across(where(is.numeric), fd_pc, .names = "{.fn}.{.col}")) %>%
mutate(across(where(is.numeric), round, 3)) %>%
select(everything()) %>%
interactive_table()
The survey for this is SEPH. This needs to be run by mid-February.
employment_by_industury <- cansim::get_cansim_ndm(employment_by_industury_table_number_annual)
Accessing CANSIM NDM product 14-10-0202 from Statistics Canada
Parsing data
Folding in metadata
employment_by_industury <- janitor::clean_names(employment_by_industury)
employment_by_industury %>%
select(ref_date,geo,type_of_employee,north_american_industry_classification_system_naics,value,status) %>%
mutate(ref_date = parse_date(ref_date, "%Y")) %>%
filter(ref_date > from_this_year) %>%
pivot_wider(names_from = type_of_employee, values_from = c(value,status)) %>%
mutate(across(where(is.character), as.factor)) %>%
arrange(north_american_industry_classification_system_naics, geo, ref_date) %>%
group_by(north_american_industry_classification_system_naics, geo) %>%
mutate(across(where(is.numeric), fd_pc, .names = "{.fn}.{.col}")) %>%
mutate(across(where(is.numeric), round, 3)) %>% head()
# A tibble: 6 x 15
# Groups: north_american_industry_classification_system_naics, geo [2]
ref_date geo north_american_~ `value_All empl~ `value_Salaried~
<date> <fct> <fct> <dbl> <dbl>
1 2017-01-01 Albe~ Aboriginal publ~ 6574 NA
2 2018-01-01 Albe~ Aboriginal publ~ 6555 NA
3 2019-01-01 Albe~ Aboriginal publ~ 6941 NA
4 2017-01-01 Brit~ Aboriginal publ~ 10466 NA
5 2018-01-01 Brit~ Aboriginal publ~ 10725 NA
6 2019-01-01 Brit~ Aboriginal publ~ 11265 NA
# ... with 10 more variables: `value_Employees paid by the hour` <dbl>,
# `status_All employees` <fct>, `status_Salaried employees paid a fixed
# salary` <fct>, `status_Employees paid by the hour` <fct>, `fd.value_All
# employees` <dbl>, `pc.value_All employees` <dbl>, `fd.value_Salaried
# employees paid a fixed salary` <dbl>, `pc.value_Salaried employees paid a
# fixed salary` <dbl>, `fd.value_Employees paid by the hour` <dbl>,
# `pc.value_Employees paid by the hour` <dbl>
Job Vacancy and W
Focusing on agriculture/aquaculture.
grain_shipment <- cansim::get_cansim_ndm(grain_shipment_table_number)
Accessing CANSIM NDM product 32-10-0351 from Statistics Canada
Parsing data
Folding in metadata
grain_shipment <- janitor::clean_names(grain_shipment)
grain_shipment %>%
select(ref_date,geo,type_of_grain,value) %>%
mutate(ref_date = parse_date(ref_date, "%Y-%m")) %>%
filter(ref_date > from_this_year) %>%
pivot_wider(names_from = type_of_grain, values_from = value) %>%
filter(geo %in% provinces) %>%
as_tsibble(key = c(geo)) %>%
group_by_key() %>%
index_by(year_month = ~ year(.)) %>%
summarise(across(where(is.numeric),list(sum))) %>%
as_tibble() %>%
rename(ref_date = year_month) %>%
mutate(ref_date = lubridate::make_date(ref_date)) %>%
select(everything()) -> master_annual_grain_shipment
Using `ref_date` as index variable.
master_annual_grain_shipment %>% data.table::fwrite(here::here("output", str_glue("grain_production {grain_shipment_table_number}.csv")))
master_provincial_table <- left_join(master_provincial_table,master_annual_grain_shipment)
Joining, by = c("geo", "ref_date")
master_provincial_table %>% data.table::fwrite(here::here("output", str_glue("provincial_view.csv")))
master_provincial_table %>%
mutate(across(where(is.character), as.factor)) %>%
mutate(ref_date = year(ref_date)) %>%
mutate(ref_date = as.factor(ref_date)) %>%
rename_with(~ tolower(gsub("_", " ", .x, fixed = TRUE))) %>%
filter(geo != "Canada") %>%
interactive_table() %>%
formatCurrency(3:5, digits = 0)%>%
formatStyle(
6,
background = styleColorBar(master_provincial_table$population, 'teal'),
backgroundSize = '100% 90%',
backgroundRepeat = 'no-repeat',
backgroundPosition = 'center'
) %>%
formatStyle(
7,
background = styleColorBar(master_provincial_table$import, 'steelblue'),
backgroundSize = '100% 90%',
backgroundRepeat = 'no-repeat',
backgroundPosition = 'center'
) %>%
formatStyle(
8,
background = styleColorBar(master_provincial_table$export, 'red'),
backgroundSize = '100% 90%',
backgroundRepeat = 'no-repeat',
backgroundPosition = 'center'
)
Year 2020 only:
master_provincial_table %>%
mutate(across(where(is.character), as.factor)) %>%
mutate(ref_date = year(ref_date)) %>%
filter(ref_date == 2020) %>%
mutate(ref_date = as.factor(ref_date)) %>%
rename_with(~ tolower(gsub("_", " ", .x, fixed = TRUE))) %>%
filter(geo != "Canada") %>%
interactive_table() %>%
formatCurrency(3:5, digits = 0)%>%
formatStyle(
6,
background = styleColorBar(master_provincial_table$population, 'teal'),
backgroundSize = '100% 90%',
backgroundRepeat = 'no-repeat',
backgroundPosition = 'center'
) %>%
formatStyle(
7,
background = styleColorBar(master_provincial_table$import, 'steelblue'),
backgroundSize = '100% 90%',
backgroundRepeat = 'no-repeat',
backgroundPosition = 'center'
) %>%
formatStyle(
8,
background = styleColorBar(master_provincial_table$export, 'red'),
backgroundSize = '100% 90%',
backgroundRepeat = 'no-repeat',
backgroundPosition = 'center'
)
Maybe use * in a regex to match wildcards in the coordinates; for example 1.1.** *
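The wildcard idea above maps onto an anchored regular expression with escaped dots; a small base-R sketch with made-up coordinate strings:

```r
# Match NDM-style coordinates that start with "1.1." (dots are escaped
# so they match literal dots, not "any character").
coords <- c("1.1.2.0", "1.1.15.3", "1.2.2.0", "11.1.2.0")
grepl("^1\\.1\\.", coords)  # TRUE TRUE FALSE FALSE
```

The same pattern works inside a dplyr filter, e.g. filter(str_detect(coordinate, "^1\\.1\\.")).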
Use mutate_at(vars(births:residual_deviation), list(~ .x - lag(.x))) to automatically create a first difference of any table.
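The fd_pc helper used with across() earlier on this page is defined in setup code not shown here; given the fd. and pc. column names it produces, it is presumably a named list of two functions, roughly:

```r
# fd_pc as a named list of lambdas for across(.fns = fd_pc,
# .names = "{.fn}.{.col}"): "fd" = first difference,
# "pc" = percent change versus the previous row.
fd_pc <- list(
  fd = ~ .x - dplyr::lag(.x),
  pc = ~ ((.x / dplyr::lag(.x)) - 1) * 100
)
```

With .names = "{.fn}.{.col}" this yields columns such as fd.value and pc.value, matching the output shown earlier; this reconstruction is my inference, not the project's verified definition.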
rename_with(iris, ~ tolower(gsub(".", "_", .x, fixed = TRUE)))
I created a schema for naming things; I should have followed the Statcan names instead of naming things for simplicity. Create code templates with text generation based on three scenarios: annual tables, weekly tables, and monthly tables.
Have a to-do list.
sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)
Matrix products: default
locale:
[1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252
[3] LC_MONETARY=English_Canada.1252 LC_NUMERIC=C
[5] LC_TIME=English_Canada.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lubridate_1.7.9.2 tsibble_0.9.3 here_1.0.1 cansim_0.3.5
[5] janitor_2.0.1 DT_0.16 forcats_0.5.0 stringr_1.4.0
[9] dplyr_1.0.2 purrr_0.3.4 readr_1.4.0 tidyr_1.1.2
[13] tibble_3.0.4 ggplot2_3.3.2 tidyverse_1.3.0 workflowr_1.6.2
loaded via a namespace (and not attached):
[1] httr_1.4.2 jsonlite_1.7.2 viridisLite_0.3.0 modelr_0.1.8
[5] assertthat_0.2.1 cellranger_1.1.0 yaml_2.2.1 pillar_1.4.7
[9] backports_1.2.1 glue_1.4.2 digest_0.6.27 promises_1.1.1
[13] rvest_0.3.6 snakecase_0.11.0 colorspace_2.0-0 htmltools_0.5.0
[17] httpuv_1.5.4 pkgconfig_2.0.3 broom_0.7.3 haven_2.3.1
[21] scales_1.1.1 whisker_0.4 later_1.1.0.1 git2r_0.27.1
[25] generics_0.1.0 farver_2.0.3 ellipsis_0.3.1 gapminder_0.3.0
[29] withr_2.3.0 lazyeval_0.2.2 cli_2.2.0 magrittr_2.0.1
[33] crayon_1.3.4 readxl_1.3.1 evaluate_0.14 ps_1.5.0
[37] fs_1.5.0 fansi_0.4.1 anytime_0.3.9 xml2_1.3.2
[41] tools_4.0.3 data.table_1.13.4 hms_0.5.3 lifecycle_0.2.0
[45] plotly_4.9.2.2 munsell_0.5.0 reprex_0.3.0 compiler_4.0.3
[49] rlang_0.4.9 grid_4.0.3 rstudioapi_0.13 htmlwidgets_1.5.3
[53] crosstalk_1.1.0.1 labeling_0.4.2 rmarkdown_2.6 gtable_0.3.0
[57] DBI_1.1.0 curl_4.3 R6_2.5.0 knitr_1.30
[61] utf8_1.1.4 rprojroot_2.0.2 stringi_1.5.3 Rcpp_1.0.5
[65] vctrs_0.3.6 dbplyr_2.0.0 tidyselect_1.1.0 xfun_0.19